DETAILED ACTION

Specification 

1. The lengthy specification has not been checked to the extent necessary to
determine the presence of all possible minor errors. Applicant's cooperation is 
requested in correcting any errors of which applicant may become aware in the 
specification.

Claim Rejections - 35 USC §112 

2. The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

3. Claims 1-38 are rejected under 35 U.S.C. 112, first paragraph, because the
specification, while being enabling for a partial sums bus and a distributed computing 
environment, does not reasonably provide enablement for calculating partial sums. The 
specification does not enable any person skilled in the art to which it pertains, or with 
which it is most nearly connected, to make and/or use the invention commensurate in 
scope with these claims. Specifically, the specification sets forth convolution 
operations, but does not provide enablement for the claimed language of calculating 
partial sums with the partial sums comprising weights of various portions of samples, 
without actually performing multiplicative operations via matrices to actually perform the 
convolution, in that only addition operations are specifically performed, and the resultant 
combination would not, in fact, meet the recited purpose found in the preamble, that of 
performing convolution. If applicant believes that this functionality is inherent to the
claimed system, applicant should provide reasoning to support this conclusion. 

4. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter, which the applicant regards as his invention. 

5. Claims 1-38 are rejected under 35 U.S.C. 112, second paragraph, as being
indefinite for failing to particularly point out and distinctly claim the subject matter which
applicant regards as the invention.

6. Where applicant acts as his or her own lexicographer to specifically define a term 
of a claim contrary to its ordinary meaning, the written description must clearly redefine 
the claim term and set forth the uncommon definition so as to put one reasonably skilled 
in the art on notice that the applicant intended to so redefine that claim term. Process
Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed.
Cir. 1999). The term "convolution kernel" in claims 1-38 is used by the claim to mean "a
piece of software operation to perform partial summation and addition operations within 
a piece of hardware in a distributed computing environment", while the accepted 
meaning is "a piece of software running on a hardware platform performing actual, 
mathematical convolution operations inclusive of multiplication." The term is indefinite 
because the specification does not clearly redefine the term. 

7. The term "finely" in claim 3 is a relative term that renders the claim indefinite. 
The term "finely" is not defined by the claim, the specification does not provide a 
standard for ascertaining the requisite degree, and one of ordinary skill in the art would 
not be reasonably apprised of the scope of the invention. That is, the degree of 
interleaving is completely unknown, as it could easily be anywhere from the
sub-fragment (e.g. deep sub-pixel) level to that of the screen being split into only two
regions. If the term "finely" is used in any other claims, those are also indefinite, so
applicant should amend accordingly.

8. Claims 5-38 are rejected as being indefinite because of the term "a chain", which 
does not require serial processing per se, which, according to the specification, is the 
intended purpose of the claimed invention and is obviously applicant's intent. Further, 
this would be a broader version of claim 1, which is directed mainly to serial processing,
and as such the rejection is also for that. If applicant would provide a definition for the 
term 'chain' in the context of these claims that is supported by the specification, that 
would successfully traverse this particular rejection. 

9. Claims 1-7 are rejected as failing to define the invention in the manner required 
by 35 U.S.C. 112, second paragraph, with claims 2-7 being rejected for not correcting 
the deficiencies of the parent claim. 

10. The structure that goes to make up the device must be clearly and positively
specified. The structure must be organized and correlated in such a manner as to 
present a complete operative device. The claim as written does not provide a complete, 
operative device, in that it is a fundamental of the art that convolution operations clearly 
require multiplication, and the applicant's invention is fundamentally performing filtering 
(e.g. digital implementation / approximation of an ideal continuous-time FIR or IIR filter),
which obviously also requires multiplication.
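
For reference, the standard textbook definition of discrete convolution, on which the statement above relies, is reproduced here. The symbols x (input samples), h (filter coefficients), y (output) and M (number of coefficients) are illustrative and do not appear in the claims or the cited references.

```latex
% Discrete convolution of an input x with an M-tap FIR filter h.
% Every term of the sum is a product h[k] x[n-k], so the operation
% cannot be carried out with additions alone.
y[n] = \sum_{k=0}^{M-1} h[k]\, x[n-k]
```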



Application/Control Number: 10/673,087 Page 5 

Art Unit: 2672 

11. The definition of the term "memory units" in claim 16 is taken to require multiple
physical rather than logical memories. If applicant wishes to dispute this definition, 
applicant must make that point clear - the allowability of the claim will be withdrawn if 
applicant intends the meaning to be different logical memories. 

12. Examiner invites and encourages applicant to point out sections of the
specification that may provide evidence that, contrary to the above rejections under
35 U.S.C. 112, first and second paragraphs, the specification is actually a) enabled
and / or b) definite.

Claim Rejections - 35 USC § 103 

13. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

14. Claims 1-2 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Cloutier (US 5,892,962) in view of Choi et al (US 5,742,349). 

15. As to claim 1, 

A system for distributed convolution of samples comprising: (Choi 2:10-20 
teaches convolution)(Cloutier 7:26-55) 
-A sample manager operable to: 

-Calculate partial sums for a portion of the samples within a convolution kernel, 
wherein the partial sums comprise 1) a sum of weights determined for locations of the 
samples in the portion of samples and 2) a sum of weighted sample values for the 
portion of samples, (Choi teaches weighted partial sums in 5:62-6:25, where
the weights are based on position, e.g. the pixel location per line, which clearly
comprises a sum of weights determined for each position as explicitly set forth in
6:2-6:20, and the result of the weighted values are summed and put back into memory as
set forth there)(Cloutier clearly establishes in 7:26-55 that the system has many
processing elements, each of which computes its own partial sums for convolution
purposes)

-Add the partial sums to any previously accumulated partial sums, and (Choi 
5:62-6:25 as set forth above)(Cloutier 7:26-55 again, where it is well known that the 
partial sums must be added, and 4:20-35, where it is taught that the present 
embodiment is well-suited for matrix and vector addition and multiplication. More 
specifically, the embodiment of Cloutier is taught to perform multiply-accumulate
operations (8:50-9:15), which clearly requires that the network of processing elements
perform multiply-accumulate operations per tile (with each processing element
performing said operations), and in a neural network application, which provides
feed-forward information (e.g. feedback) for pattern recognition and similar,
multiply-accumulate operations used in the processing of an image would obviously be added
and passed along, as the architecture of Cloutier as shown in Fig. 1 is such that data is 
passed along between elements in the additive fashion as set forth above. Also, the 
system performs convolution and uses partial sums, which clearly requires that the 
accumulated partial sums be passed to other elements.) 

-Output new accumulated partial sums; (Given that the system of Choi produces
graphical output (1:40-55) for a monitor, it clearly outputs partial sum data that is
eventually turned into video output; convolution using partial sums prima facie requires
that such sums be output.)(The same logic, that the convolution of Cloutier in 7:26-55
and 8:50-9:15 clearly requires that the results of the computation and accumulated
partial sums be output to other elements in the array so that computation can continue)

-A first partial sums bus connected to the sample manager for receiving any 
previously accumulated partial sums; and a (Cloutier Fig. 1, element 114, global bus, is 
clearly connected to the process (SIMD) controller 108, and further there is a program 
control or instruction bus 118 connected to the system 102) 

-A second partial sums bus connected to the sample manager for outputting the 
new accumulated partial sums. (Cloutier Fig. 1, element 114, global bus, is clearly 
connected to the process (SIMD) controller 108, and further there is a program control 
or instruction bus 118 connected to the system 102) 
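
For illustration only, the operations recited in claim 1 (calculate partial sums, add them to previously accumulated partial sums, output the new accumulated sums) can be sketched as follows. This is a minimal sketch and not applicant's or any reference's implementation. The names `PartialSums`, `accumulate`, `pixel_value` and `box_weight`, the box-shaped weighting function, and the final division of the weighted-value sum by the weight sum are all assumptions introduced here. Note that the weighted sample value is formed with a multiplication.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A sample is (x, y, value); this layout is assumed for illustration.
Sample = Tuple[float, float, float]


@dataclass(frozen=True)
class PartialSums:
    weight_sum: float = 0.0          # 1) sum of weights
    weighted_value_sum: float = 0.0  # 2) sum of weighted sample values


def accumulate(prev: PartialSums,
               samples: Sequence[Sample],
               center: Tuple[float, float],
               weight_fn: Callable[[float, float], float]) -> PartialSums:
    """One hypothetical sample manager: compute partial sums for its own
    portion of samples, add them to the sums received from the previous
    manager, and return the new accumulated sums for the next one."""
    cx, cy = center
    w_sum = 0.0
    wv_sum = 0.0
    for x, y, value in samples:
        w = weight_fn(x - cx, y - cy)  # weight depends on sample location
        w_sum += w
        wv_sum += w * value            # multiplication happens here
    return PartialSums(prev.weight_sum + w_sum,
                       prev.weighted_value_sum + wv_sum)


def pixel_value(total: PartialSums) -> float:
    """Assumed final step: normalize by the accumulated weight sum."""
    if total.weight_sum == 0.0:
        return 0.0
    return total.weighted_value_sum / total.weight_sum


def box_weight(dx: float, dy: float) -> float:
    """Assumed kernel: weight 1 inside a 2x2 box, 0 outside."""
    return 1.0 if abs(dx) <= 1.0 and abs(dy) <= 1.0 else 0.0


# Two managers in a chain, each holding a different portion of the samples.
portion_a = [(0.0, 0.0, 2.0), (0.5, 0.5, 4.0)]
portion_b = [(-0.5, 0.0, 6.0), (3.0, 3.0, 100.0)]  # last sample is outside the kernel
after_a = accumulate(PartialSums(), portion_a, (0.0, 0.0), box_weight)
after_b = accumulate(after_a, portion_b, (0.0, 0.0), box_weight)
result = pixel_value(after_b)
```

In this sketch the value passed from one call to the next plays the role of the data carried on the first and second partial sums buses.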

Firstly, the claimed "sample manager" is a vague term that encompasses any 
kind of controller that is operable to perform and direct logical operations, specifically 
convolution, across a processor, which is clearly fulfilled by each FPGA (which is 
divided into a number of processing elements), which in turn is directed by the process 
controller 108 in Fig. 1 of Cloutier. 

Secondly, each processor is connected to the global bus 114 and the instruction 
bus 118. The global bus 118 is clearly bidirectional, as illustrated by the arrows in the
diagram of Figure 1. A bidirectional bus that connects all the processing elements 
(PEs) in a toroidal configuration is further illustrated in Fig. 3 of Cloutier (that is, the
structuring of the PEs for each FPGA). The claim recites two buses that receive and 
output partial sums. A bidirectional bus is functionally and structurally equivalent to two 
unidirectional buses, as it operates in both directions as recited by the claim, and further 
since it carries information between PEs it clearly would be a "partial sums" bus. 
Having unidirectional buses is a trivially obvious modification and as such, the issue of
having two buses is moot (and there are two buses anyway, and one of them could be 
made unidirectional to comply with the recited limitations as, again, is a trivially obvious 
variation if one wanted to achieve full-duplex transmission capabilities). 

Cloutier clearly establishes, as set forth above, that each PE performs
convolution based on weighted partial sum operations. Furthermore, the nature of 
convolution is such that once partial sums are computed, they must be acted on by 
other elements or processors to produce the final, desired results. 

Further, each FPGA is interconnected with each other as shown in Fig. 1, and
the PEs on each FPGA are connected as set forth in Fig. 3. The FPGAs and the PEs 
within each FPGA are all interconnected, and the PEs and the FPGAs can clearly be 
connected in a chain or serial fashion, as the very nature of an FPGA is that the blocks 
can be set to have any desired set of connections with bidirectional or unidirectional 
communications. 

As such, examiner takes the position that explanations cited above in response 
to each element of the portion of the claim dealing with the computation and/or 
calculation of partial sums more than adequately meet all of the limitations set forth by
that section of the claim. Further, any PE that received partial sums from the bus
would clearly add them using the multiply-accumulate operations cited above, 
particularly in the case of a neural network that was being used to perform convolution, 
which would be obvious to do since the system of Cloutier clearly has established utility 
for performing both tasks, and optical character recognition (OCR), which requires 
convolution and pre-processing. Further, Cloutier clearly teaches the applicability of his 
system to image processing in 4:20-35. 

The references are clearly analogous art, as both perform convolution using 
weighted partial sums as established by the above citations. It would have been 
obvious to one having ordinary skill in the art at the time the invention was made to 
combine the systems of Cloutier with the system of Choi, as the system of Choi 
provides more detailed capabilities for each processor to perform more and more 
detailed graphics operations in general, and the processors of Choi could provide 
increased speeds for the architecture of Cloutier. 

16. As to claim 2, clearly all of the processing elements in Fig. 1 and Fig. 3 of 
Cloutier are clearly connected via the global bus anyway, which clearly meets the 
requirements that all the sample managers be chained, and that the final member
calculates pixel values - clearly each chip is computing pixel values from the results of 
the prior one - see Cloutier 7:7-67. Further, as explained above in the rejection to claim 
1, the FPGAs and the PEs within each FPGA are all interconnected, and the PEs and
the FPGAs can clearly be connected in a chain or serial fashion, as the very nature of
an FPGA is that the blocks can be set to have any desired set of connections with
bidirectional or unidirectional communications. 

17. Claims 3-7 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Cloutier in view of Choi as applied to claims 1 and 2 above, and further in view of Inada
et al (US 2004/0004620 A1)('Inada').

18. As to claim 3, the system of claim 2, wherein for each sample manager the
corresponding portion of samples resides in a sub-set of screen space and the sub-sets 
are finely interleaved across screen space. Clearly the system of Inada establishes in 
[0154] that the system breaks the screen down into blocks of 4x4 pixels for interleaving, 
which constitutes a "fine" distribution across screen space, and further in Fig. 1 it is 
shown how the screen is divided into smaller areas, where each area is analyzed for the 
presence of a primitive in the pixels in that particular, smaller area. Further, Cloutier
teaches the division of an image into blocks, as in 7:7-67 for processing and convolution
purposes. That being said, it would have been obvious to one having ordinary skill in
the art at the time the invention was made to combine the systems of Cloutier and Choi 
for the reasons set forth above (the motivation and combination of claim 2 are herein 
incorporated by reference) with the system of Inada, to allow interleaving as that 
technique speeds up drawing time (Inada [0155]). 

19. As to claim 4, the system of claim 3, comprising 16 sample managers, wherein 
each sample manager addresses one sample bin in a 4 by 4 array of sample bins that is 
repeated across screen space. 
Reference Choi does not expressly teach this limitation, whilst Reference
Cloutier teaches in 7:28-40 specifically that a matrix of 8x4 PEs is implemented on each
FPGA, where there is a 2x2 array of FPGAs in the first place, and Inada provides
additional support. Given this, it would be reasonable to use a 4x4 array instead of an
8x4 array, given that each FPGA could easily be partitioned into a 4x4 array, as
illustrated in Figure 3 anyway. Now, as set forth in the rejections to claims 1-3 above,
the system of Cloutier is taught for use with image convolution, and further in Inada the
use of interlaced scans is taught, such that the screen is divided up into units of say 4x4
pixels for faster drawing time. Therefore, if one FPGA with a 4x4 array of PEs, or a
2x2 array of FPGAs with a 2x2 PE implementation, with each one dedicated to processing
a certain portion of the screen, was used, and interleaving was used for the results, it
would be logical to use the claimed 4x4 architecture. Motivation and combination is taken
from the parent claim and herein incorporated by reference, with additional motivation
as set forth in the immediately preceding paragraph.
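
For illustration only, the arrangement recited in claim 4 (16 sample managers, each addressing one bin of a 4 by 4 array of sample bins repeated across screen space) can be sketched as a simple modulo mapping. The function names and the row-major numbering of managers are assumptions introduced here; neither the claims nor the cited references specify them.

```python
from collections import Counter


def manager_for_bin(bin_x: int, bin_y: int, tile: int = 4) -> int:
    """Assumed row-major mapping from a sample-bin coordinate to one of
    tile*tile sample managers; the pattern repeats every `tile` bins."""
    return (bin_y % tile) * tile + (bin_x % tile)


def bins_for_manager(manager: int, width: int, height: int, tile: int = 4):
    """List the bins of a width x height screen handled by one manager."""
    return [(x, y)
            for y in range(height)
            for x in range(width)
            if manager_for_bin(x, y, tile) == manager]


# On an 8x8 grid of bins, every one of the 16 managers owns 4 bins that
# are spread across the screen rather than grouped in one region.
load = Counter(manager_for_bin(x, y) for y in range(8) for x in range(8))
owned_by_5 = bins_for_manager(5, 8, 8)
```

Under this assumed mapping, neighboring bins always belong to different managers, which is one way to read "finely interleaved" in claim 3.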

20. As to claim 5, this claim is a trivially obvious variant of claim 2; Cloutier Fig. 1 
clearly illustrates a plurality of FPGAs configured in a matrix connection, all with global 
bus connections, and Fig. 3 illustrates similar connections between PEs on one FPGA. 
Since only the primary reference is utilized, no separate motivation or combination is 
required and that from the rejection to the parent claim is herein incorporated by 
reference. 

21. As to claim 6, the rejection from claim 2 is herein incorporated by reference in its
entirety, as it addresses interleaving. Further, the system of Inada clearly teaches, as in
[0154-0155] and particularly in Fig. 1 the use of groups of 4x4 and 2x2 pixels
respectively across the screen. Further evidence that this technique is common is 
provided by US 2003/01697274 to Oberoi et al, for example, where Fig. 7 clearly 
illustrates 2x2 tile bins with a plurality of samples in each bin as required above. 
Motivation / combination are taken from the previous claim and incorporated via
reference.

22. As to claim 7, this is a trivially obvious variant of claim 5, wherein the only 
difference from claim 2 is that the PE or FPGA or sample manager calculates pixel 
values from the final accumulated partial sums, which would obviously happen at the 
end of a convolution process for images or video in any case. 

23. Claims 1-2, 8, 11, 14-15, and 17-18 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Cloutier in view of Choi as applied to claim 1 above, and further in 
view of McCanny et al (US 4,885,715)('McCanny').

24. As to claim 1,

A system for distributed convolution of samples comprising: (Choi 2:10-20
teaches convolution)(Cloutier 7:26-55)(McCanny 4:13-33)
-A sample manager operable to: 

-Calculate partial sums for a portion of the samples within a convolution kernel, 
wherein the partial sums comprise 1) a sum of weights determined for locations of the
samples in the portion of samples and 2) a sum of weighted sample values for the
portion of samples, (Choi teaches weighted partial sums in 5:62-6:25, where
the weights are based on position, e.g. the pixel location per line, which clearly
comprises a sum of weights determined for each position as explicitly set forth in
6:2-6:20, and the result of the weighted values are summed and put back into memory as
set forth there)(Cloutier clearly establishes in 7:26-55 that the system has many
processing elements, each of which computes its own partial sums for convolution
purposes)(McCanny in 4:13-33 clearly teaches the computation of partial sums wherein 
the sum of the weighted values is accumulated and passed along to the next processing 
elements) 

-Add the partial sums to any previously accumulated partial sums, and (Choi 
5:62-6:25 as set forth above)(Cloutier 7:26-55 again, where it is well known that the 
partial sums must be added, and 4:20-35, where it is taught that the present 
embodiment is well-suited for matrix and vector addition and multiplication. More 
specifically, the embodiment of Cloutier is taught to perform multiply-accumulate 
operations (8:50-9:15), which clearly requires that the network of processing elements 
perform multiply-accumulate operations per tile (with each processing element 
performing said operations), and in a neural network application, which provides
feed-forward information (e.g. feedback) for pattern recognition and similar,
multiply-accumulate operations used in the processing of an image would obviously be added
and passed along, as the architecture of Cloutier as shown in Fig. 1 is such that data is 
passed along between elements in the additive fashion as set forth above. Also, the 
system performs convolution and uses partial sums, which clearly requires that the 
accumulated partial sums be passed to other elements.) (McCanny in 4:13-33 clearly 
teaches the computation of partial sums wherein the sum of the weighted values is
accumulated and passed along to the next processing elements) 

-Output new accumulated partial sums; (Given that the system of Choi produces 
graphical output (1:40-55) for a monitor, it clearly outputs partial sum data that is
eventually turned into video output; convolution using partial sums prima facie requires
that such sums be output.)(The same logic, that the convolution of Cloutier in 7:26-55
and 8:50-9:15 clearly requires that the results of the computation and accumulated
partial sums be output to other elements in the array so that computation can continue)
(McCanny in 4:13-33 clearly teaches the computation of partial sums wherein the sum
of the weighted values is accumulated and passed along to the next processing 
elements, clearly the values are output to the next element in the processing chain as 
set forth above) 

-A first partial sums bus connected to the sample manager for receiving any 
previously accumulated partial sums; and a (Cloutier Fig. 1, element 114, global bus, is
clearly connected to the process (SIMD) controller 108, and further there is a program
control or instruction bus 118 connected to the system 102)(McCanny, elements 16, 18,
and 20 in Fig. 1, wherein each processing element is connected in this fashion to the
others) 

-A second partial sums bus connected to the sample manager for outputting the 
new accumulated partial sums. (Cloutier Fig. 1, element 114, global bus, is clearly 
connected to the process (SIMD) controller 108, and further there is a program control 
or instruction bus 118 connected to the system 102) (McCanny, elements 16, 18, and
20 in Fig. 1, wherein each processing element is connected in this fashion to the others)

Firstly, the claimed "sample manager" is a vague term that encompasses any 
kind of controller that is operable to perform and direct logical operations, specifically 
convolution, across a processor, which is clearly fulfilled by each FPGA (which is 
divided into a number of processing elements), which in turn is directed by the process
controller 108 in Fig. 1 of Cloutier. 

Secondly, each processor is connected to the global bus 114 and the instruction 
bus 118. The global bus 118 is clearly bidirectional, as illustrated by the arrows in the
diagram of Figure 1. A bidirectional bus that connects all the processing elements
(PEs) in a toroidal configuration is further illustrated in Fig. 3 of Cloutier (that is, the 
structuring of the PEs for each FPGA). The claim recites two buses that receive and 
output partial sums. A bidirectional bus is functionally and structurally equivalent to two 
unidirectional buses, as it operates in both directions as recited by the claim, and further 
since it carries information between PEs it clearly would be a "partial sums" bus. 
Having unidirectional buses is a trivially obvious modification and as such, the issue of 
having two buses is moot (and there are two buses anyway, and one of them could be 
made unidirectional to comply with the recited limitations as, again, is a trivially obvious 
variation if one wanted to achieve full-duplex transmission capabilities). 

Cloutier clearly establishes, as set forth above, that each PE performs
convolution based on weighted partial sum operations. Furthermore, the nature of 
convolution is such that once partial sums are computed, they must be acted on by
other elements or processors to produce the final, desired results. 

Further, each FPGA is interconnected with each other as shown in Fig. 1, and
the PEs on each FPGA are connected as set forth in Fig. 3. The FPGAs and the PEs 
within each FPGA are all interconnected, and the PEs and the FPGAs can clearly be 
connected in a chain or serial fashion, as the very nature of an FPGA is that the blocks
can be set to have any desired set of connections with bidirectional or unidirectional 
communications. McCanny clearly supports this limitation, as in both 4:13-33 and 
12:28-45 the use of such processors in cascaded configuration is taught. 

As such, examiner takes the position that explanations cited above in response 
to each element of the portion of the claim dealing with the computation and/or 
calculation of partial sums more than adequately meet all of the limitations set forth by
that section of the claim. Further, any PE that received partial sums from the bus
would clearly add them using the multiply-accumulate operations cited above, 
particularly in the case of a neural network that was being used to perform convolution, 
which would be obvious to do since the system of Cloutier clearly has established utility 
for performing both tasks, and optical character recognition (OCR), which requires 
convolution and pre-processing. Further, Cloutier clearly teaches the applicability of his 
system to image processing in 4:20-35. Further, McCanny clearly teaches all of the 
limitations of the above systems, wherein the chain of processing means connected 
together to perform the recited tasks are clearly set forth in 4:13-33, as well as 12:28-45.

The references are clearly analogous art, as both perform convolution using
weighted partial sums as established by the above citations. It would have been 
obvious to one having ordinary skill in the art at the time the invention was made to 
combine the systems of Cloutier with the systems of Choi and McCanny for the reasons 
set forth above, as the system of Choi provides more detailed capabilities for each 
processor to perform more and more detailed graphics operations in general, and the 
processors of Choi could provide increased speeds for the architecture of Cloutier and 
McCanny. 

25. As to claim 2, the claimed chain of sample managers corresponds merely to a
chain of processors or processing elements; any such chain capable of performing the
recited functions of the sample manager would meet the limitations of the claim. References Cloutier
and Choi implicitly teach these limitations, while reference McCanny clearly teaches this 
limitation in 4:13-33, where it is set forth that the processing elements accumulate the 
values and pass them down the chain as set forth above. 

26. As to claim 8, it is clearly merely a trivially obvious variant of claim 1, with the
word 'means' inserted. The 'means' recited are generic processing elements and thusly 
do not require any specialized function or structure to meet the recited limitations. 
Again, in means plus function cases the question is whether or not the applied 
reference performs the stated function whilst retaining at least similar structure. Both 
McCanny and Cloutier clearly teach arrays of processing elements for performing 
distributed convolution as recited by the claims, and taken together as taught in the 
above rejection to claim 1 clearly establishes that the claimed configuration is not 
unique and is well known in the art. Again, applicant recites several embodiments in the
specification. Further, the convolution kernel limitation is addressed in earlier 
paragraphs of the rejection to claim 1 above. Furthermore, the pixel limitation is 
addressed in the incorporation of the Choi reference, where the screen is divided into 
various sections - 1:1 - 2:67 where this is taught, and obviously the screen is divided 
for interlacing / interleaving as illustrated in Figs. 1 and 4 in Choi. Therefore, the applied
references are valid vis-a-vis the various embodiments applicant has provided in the 
specification, in that the form and function of the arrays and their controllers as in 
McCanny and Cloutier clearly meet the recited limitations. 

27. As to claim 11, the recited limitation is met by the primary reference. See Figure
1 in Cloutier, where each FPGA 104 is taught to have its own memory 120, and the 
processing elements taught in Fig. 3 are merely partitioned versions of gates and 
memories, and the memory onboard each FPGA could clearly be divided into 4 sections 
(which is clearly known from the fundamentals of the art, regarding division into virtual 
and protected memories, kernel space, et cetera, which are well known techniques in 
embedded logic). Since only the primary reference is utilized, no separate motivation or 
combination is required and that from the rejection to the parent claim is herein 
incorporated by reference. 

28. As to claim 13, the recited limitation is not explicitly taught by Cloutier or 
McCanny, but is explicitly taught by Choi, wherein in Fig. 4, weighted sums from 
memories are shown with their respective weights. Further, the limitation for video 
output is met in element 200 et al in Fig. 9 where the VGA is put into element 200 and
NTSC output data is generated. It would have been obvious to one having ordinary skill
in the art at the time the invention was made to combine the systems of Cloutier with the 
systems of Choi and McCanny for the reasons set forth above, as the system of Choi 
provides more detailed capabilities for each processor to perform more and more 
detailed graphics operations in general, and the processors of Choi could provide 
increased speeds for the architecture of Cloutier and McCanny.

29. As to claim 14, this is a trivial variation of claim 8. The same rejection is valid
upon it, specifically in that the same tasks are being performed as in the rejections to 
claims 1 and 8 above, and the McCanny reference clearly teaches a chain of 
processing elements while Cloutier teaches the use of processing elements / FPGAs 
having their own memories per element. As set forth above, the 'sample managers' are 
no more than controlling elements within each processor or FPGA, as the same 
functionality is being performed. Obviously, the partial sums would be sent to the next 
sample manager in the chain as clearly set forth in the nature of cascaded processors, 
as cited in the McCanny reference above. Motivation / combination are taken from the 
parent claim and incorporated by reference. The recited numeric limitations, those of N 
and k, are obvious in that any chain of processors would have each processor 
numbered as set forth in the claim, with the respective limitations, in that, for example, 
the first processor in the chain would be numbered zero, and of course the first 
processor would prima facie not receive data from a previous processor as it was the 
first one in the processor chain. 
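
For illustration only, the cascaded arrangement discussed above can be sketched as follows. This is a minimal sketch and is not drawn from Cloutier, McCanny, or the application; the names `partial_sum` and `run_chain`, and the representation of a sample as a (weight, value) pair, are assumptions made here. Element k = 0 has no predecessor and therefore starts from zero, and each later element adds its own weighted samples to the partial sum it receives.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each sample is a (weight, value) pair.
Sample = Tuple[float, float]


def partial_sum(samples: Sequence[Sample], incoming: float = 0.0) -> float:
    """Add this element's weighted samples to the partial sum received from upstream."""
    return incoming + sum(weight * value for weight, value in samples)


def run_chain(portions: List[Sequence[Sample]]) -> List[float]:
    """Pass a running partial sum through elements k = 0 .. N-1 and record each output."""
    outputs = []
    running = 0.0  # element 0 receives nothing from a previous element
    for samples in portions:
        running = partial_sum(samples, running)
        outputs.append(running)
    return outputs


if __name__ == "__main__":
    # Three elements, each holding one sample from its own memory.
    print(run_chain([[(0.5, 2.0)], [(0.25, 4.0)], [(0.25, 8.0)]]))
```

The last entry of the returned list is the sum accumulated over the whole chain; the earlier entries are the values handed from one element to the next.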

30. As to claim 15, each processor within the Cloutier reference contains its own 
memory, and it is known that each processor performs convolution on a set of samples, 
and that each FPGA utilizes the onboard memory (as illustrated in Cloutier Fig. 1) to 
provide the memory for the on-board PEs to do their calculations on their section of the 
image being convolved (this is fundamental to the operation of the systolic processing 
array set forth in Cloutier and also in McCanny, which merely proves that such 
techniques are obvious and well known). Since only the primary reference is utilized, 
no separate motivation or combination is required and that from the rejection to the 
parent claim is herein incorporated by reference. 

31. As to claim 17, it is trivially obvious that a processing element would read from 
the samples stored in its own memory that would correspond to the section of 
convolution it was performing, as set forth in the explanations for the rejections set forth 
in the above paragraphs. Since only the primary reference is utilized, no separate 
motivation or combination is required and that from the rejection to the parent claim is 
herein incorporated by reference. 

32. As to claim 18, references McCanny and Cloutier do not expressly teach this 
limitation, whilst reference Choi does. Choi in Fig. 4 provides weighted sums from 
memories that are shown with their respective weights. Further, the limitation for video 
output is met in element 200 et al in Fig. 9 where the VGA is put into element 200 and 
NTSC output data is generated. It would have been obvious to one having ordinary skill 
in the art at the time the invention was made to combine the systems of Cloutier with the 
systems of Choi and McCanny for the reasons set forth above, as the system of Choi 
provides more detailed capabilities for each processor to perform more and more 
detailed graphics operations in general, and the processors of Choi could provide 
increased speeds for the architecture of Cloutier and McCanny. Finally, the system of 
McCanny is used for convolution and signal processing, which is fundamentally part of 
image convolution, which Cloutier clearly performs as set forth in the rejections to 
claims 1 and 8 above, where the specific locations of the citations in Cloutier that teach 
that limitation are located. 

33. Claims 21-26, 30, and 32-38 are rejected under 35 U.S.C. 103(a) as 
unpatentable over Cloutier in view of McCanny further in view of Inada. The rejections 
to claims 1, 8, and 14 under Cloutier in view of McCanny are herein incorporated via 
reference in their entirety, while not incorporating any mention of the Choi reference. 
This is done because the explanation of the references takes several pages and has 
already been given; there is no purpose to reciting it again. 

34. As to claim 21, firstly, the claimed filter unit as recited is clearly comparable to the 
sample managers recited in previous claims, as the functionality is the same, and the 
processing element would clearly be performing similar tasks. Each of the N memories 
recited is attached to an FPGA, which serves as a filtering element / sample manager / 
generic processor, as set forth in Cloutier Fig. 1. Clearly the system of Inada 
establishes in [0154] that the system breaks the screen down into blocks of 4x4 pixels 
for interleaving, which constitutes a distribution across screen space, and further in Fig. 
1 it is shown how the screen is divided into smaller areas, where each area is analyzed 
for the presence of a primitive in the pixels in that particular, smaller area, and in Fig. 7 
the screen is shown to be divided into 2x2 bins or tiles containing samples for 
processing purposes. Further, Cloutier teaches the division of an image into blocks, as 
in 7:7-67 for processing and convolution purposes. 

Next, the recited N and k can clearly be 1. Obviously, the system of Cloutier has 
one memory per FPGA and the system of Inada has a graphics memory 145 with a 
memory interface circuit 144 as shown in Fig. 2, where clearly this meets the recited 
limitations for the fact that N and k can be 1. Cloutier clearly establishes processing / 
filtering units that have their own dedicated filter (see Fig. 1, the memory attached to 
each FPGA). McCanny in 4:13-33 clearly teaches the computation of partial sums 
wherein the sum of the weighted values is accumulated and passed along to the next 
processing elements; clearly the values are output to the next element in the processing 
chain as set forth above, also as set forth in 12:28-55. The recited numeric limitations, 
those of N and k, are obvious in that any chain of processors would have each processor 
numbered as set forth in the claim, with the respective limitations, in that, for example, 
the first processor in the chain would be numbered zero, and of course the first 
processor would prima facie not receive data from a previous processor as it was the 
first one in the processor chain. 

As set forth in the preceding paragraph, each FPGA in Cloutier obviously reads 
from its own memory that contains the section of the image assigned to it (as in Inada) for 
convolution purposes, performs partial sum calculations on it, and moves the result into the next 
element in the chain (McCanny). The entire question of partial sums and their 
calculations is covered in the sections of the rejections of claims 1 and 8 that have been 
expressly incorporated via reference and will not be repeated for the purposes of 
brevity. 

One additional note is that the system of Inada as shown in Fig. 4 clearly shows 
a plethora of operations units attached to each register (e.g. operation units 1411-1, 
1412-1, etc. attached to registers such as 1411-2), which clearly establishes multiple 
processing / operations units attached to memories in the first place. ... 

Obviously, the system of Inada outputs pixels as set forth in paragraphs 
[0022-0024]. Clearly, the results of all the graphics calculations and convolutions would be 
passed out as pixel data as shown there, and it is logical that the end results of an 
image convolution calculation from a chain would indeed be output as pixels - indeed, 
an image is fundamentally composed of pixels, and it is a fundamental of the digital 
signal processing art that an image is output in pixels from being processed in this 
context. 

Motivation and combination are taken from the rejection to claims 8 and 14, which 
is incorporated by reference, and from the additional logic as set forth above. Further, it 
would have been obvious to one having ordinary skill in the art at the time the invention 
was made to combine the systems of Cloutier and McCanny as they both teach systolic 
arrays for convolution, and Cloutier incorporates the additional benefits of having 
FPGAs as processing nodes so that they can be reconfigured and Inada brings in the 
benefits of explaining how the screen space is subdivided so that a systolic array can 
thusly more efficiently parallel-process all the information provided from the subdivision 
of the screen space. 

35. As to claim 22, Reference Cloutier teaches in 7:28-40 specifically that a matrix of 
8x4 PEs is implemented on each FPGA where there is a 2x2 array of FPGAs in the 
first place, and Inada provides additional support. Given this, it would be reasonable to 
use a 4x4 array instead of an 8x4 array, given that each FPGA could easily be 
partitioned into a 4x4 array, as illustrated in Figure 3 anyway. Now, as set forth in the 
rejections to claims 1-3 above, the system of Cloutier is taught for use with image 
convolution, and further in Inada the use of interlaced scans is taught, such that the 
screen is divided up into units of say 4x4 pixels for faster drawing time. Therefore, if 
one FPGA with a 4x4 array of PEs, or a 2x2 array of FPGAs with a 2x2 PE 
implementation, with each one dedicated to processing a certain portion of the screen was 
used, and interleaving was used for the results, it would be logical to use the claimed 4x4 
architecture. Motivation and combination are taken from the parent claim and herein 
incorporated by reference, with additional motivation as set forth in the immediately 
preceding paragraph. 

36. As to claim 23, it is a substantial duplicate of claim 21 under the circumstances 
where M is 1. For other circumstances, obviously the system of Cloutier can be 
dynamically reconfigured to support any arrangement of processing elements, as it is 
composed of FPGAs that are fundamentally reprogrammable and can be controlled by 
the SIMD array element processor 108 in Cloutier Fig. 1. Therefore, division into 
groups is a trivially obvious variant. 

37. As to claim 24, the rejection from claim 2 is herein incorporated by reference in 
its entirety minus any sections involving the Choi reference, as it addresses interleaving. 
Further, the system of Inada clearly teaches, as in [0154-0155] and particularly in Fig. 1 
the use of groups of 4x4 and 2x2 pixels respectively across the screen. Further 
evidence that this technique is common is provided by US 2003/01697274 to Oberoi et 
al, for example, where Fig. 7 clearly illustrates 2x2 tile bins with a plurality of samples in 
each bin as required above. Motivation / combination are taken from the previous claim 
and incorporated via reference. Also, the rejection to claim 22 above addresses the 
specific case of a 4x4 array and is also herein incorporated by reference. 

38. As to claim 25, it is merely a method implementing the system of claim 21, and 
the rejection to claim 21 is valid upon it without further comment. 

39. As to claim 26, see the rejection to claim 22 above, which addresses regions 
having defined boundaries, wherein the screen space is divided as set forth there. Inada 
[0020] discusses how each region is judged with respect to its center point, which 
clearly establishes that this is an obvious variation. Motivation and combination are 
taken from the parent claim and incorporated herein by reference in their entirety. 

40. As to claim 30, this is an obvious variation and is addressed in the rejection to 
claim 21 and is repeated herein. Obviously, the system of Inada outputs pixels as set 
forth in paragraphs [0022-0024]. Clearly, the results of all the graphics calculations and 
convolutions would be passed out as pixel data as shown there, and it is logical that the 
end results of an image convolution calculation from a chain would indeed be output as 
pixels - indeed, an image is fundamentally composed of pixels, and it is a fundamental 
of the digital signal processing art that an image is output in pixels from being 
processed in this context. Motivation and combination are incorporated by reference 
from the parent claim. 

41. As to claim 32, Inada clearly addresses this limitation wherein it would be 
obvious to divide the screen up into bins and assign each one to an FPGA or 
processing element or filtering element, whatever the generic terminology for the 
FPGAs of Cloutier or the processing units of McCanny. 

42. As to claim 33, clearly the system of Inada establishes in [0154] that the system 
breaks the screen down into blocks of 4x4 pixels for interleaving, which constitutes a 
"fine" distribution across screen space, and further in Fig. 1 it is shown how the screen 
is divided into smaller areas, where each area is analyzed for the presence of a 
primitive in the pixels in that particular, smaller area. Further, Cloutier teaches the 
division of an image into blocks, as in 7:7-67 for processing and convolution purposes. 
That being said, it would have been obvious to one having ordinary skill in the art at the 
time the invention was made to combine the systems of Cloutier and Choi for the 
reasons set forth above (the motivation and combination of claim 2 are herein 
incorporated by reference) with the system of Inada, to allow interleaving as that 
technique speeds up drawing time (Inada [0155]). 

43. As to claim 34, Reference Cloutier teaches in 7:28-40 specifically that a matrix of 
8x4 PEs is implemented on each FPGA where there is a 2x2 array of FPGAs in the 
first place, and Inada provides additional support. Given this, it would be reasonable to 
use a 4x4 array instead of an 8x4 array, given that each FPGA could easily be 
partitioned into a 4x4 array, as illustrated in Figure 3 anyway. Now, as set forth in the 
rejections to claims 1-3 above, the system of Cloutier is taught for use with image 
convolution, and further in Inada the use of interlaced scans is taught, such that the 
screen is divided up into units of say 4x4 pixels for faster drawing time. Therefore, if 
one FPGA with a 4x4 array of PEs, or a 2x2 array of FPGAs with a 2x2 PE 
implementation, with each one dedicated to processing a certain portion of the screen was 
used, and interleaving was used for the results, it would be logical to use the claimed 4x4 
architecture. Motivation and combination are taken from the parent claim and herein 
incorporated by reference, with additional motivation as set forth in the immediately 
preceding paragraph. 

44. As to claim 35, obviously the system of Cloutier can be dynamically reconfigured 
to support any arrangement of processing elements, as it is composed of FPGAs that 
are fundamentally reprogrammable and can be controlled by the SIMD array element 
processor 108 in Cloutier Fig. 1. Therefore, division into groups is a trivially obvious 
variant. 

45. As to claim 36, this is a trivially obvious variant of claim 30 and is subject to the same 
rejection. 

46. As to claim 37, this is a trivially obvious variant of claim 33 and is subject to the 
same rejection. 

47. As to claim 38, the rejection from claim 24 is herein incorporated by reference in 
its entirety minus any sections involving the Choi reference, as it addresses interleaving. 
Further, the system of Inada clearly teaches, as in [0154-0155] and particularly in Fig. 1 
the use of groups of 4x4 and 2x2 pixels respectively across the screen. Further 
evidence that this technique is common is provided by US 2003/01697274 to Oberoi et 
al, for example, where Fig. 7 clearly illustrates 2x2 tile bins with a plurality of samples in 
each bin as required above. Motivation / combination are taken from the previous claim 
and incorporated via reference. Also, the rejection to claim 22 above addresses the 
specific case of a 4x4 array and is also herein incorporated by reference. 

49. Claims 3-4 and 10 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Cloutier and Choi in view of McCanny as applied to claims 1 and 2 above, and 
further in view of Inada et al (US 2004/0004620 A1) ('Inada'). 

50. As to claim 3, the system of claim 2, wherein for each sample manager the 
corresponding portion of samples resides in a sub-set of screen space and the sub-sets 
are finely interleaved across screen space. Clearly the system of Inada establishes in 
[0154] that the system breaks the screen down into blocks of 4x4 pixels for interleaving, 
which constitutes a "fine" distribution across screen space, and further in Fig. 1 it is 
shown how the screen is divided into smaller areas, where each area is analyzed for the 
presence of a primitive in the pixels in that particular, smaller area. Further, Cloutier 
teaches the division of an image into blocks, as in 7:7-67 for processing and convolution 
purposes. That being said, it would have been obvious to one having ordinary skill in 
the art at the time the invention was made to combine the systems of Cloutier, 
McCanny, and Choi for the reasons set forth above (the motivation and combination of 
claim 2 are herein incorporated by reference) with the system of Inada, to allow 
interleaving as that technique speeds up drawing time (Inada [0155]). 

51. As to claims 4 and 10, the system of claim 3, comprising 16 sample managers, 
wherein each sample manager addresses one sample bin in an array of sample bins 
that is repeated across screen space. Claim 10 merely recites the same language 
without the specific numeric limitations. 

Reference Choi does not expressly teach this limitation, whilst Reference 
Cloutier teaches in 7:28-40 specifically that a matrix of 8x4 PEs is implemented on each 
FPGA where there is a 2x2 array of FPGAs in the first place, and Inada provides 
additional support. Given this, it would be reasonable to use an array of whatever 
desired size, say the 8x4 or 2x2 arrays as set forth above, given that each FPGA could 
easily be partitioned into a 4x4 array, as illustrated in Figure 3 anyway. Now, as set 
forth in the rejections to claims 1-3 above, the system of Cloutier is taught for use with 
image convolution, and further in Inada the use of interlaced scans is taught, such that 
the screen is divided up into units of say 4x4 pixels for faster drawing time. Therefore, if 
one FPGA with a 4x4 array of PEs, or a 2x2 array of FPGAs with a 2x2 PE 
implementation, with each one dedicated to processing a certain portion of the screen was 
used, and interleaving was used for the results, it would be logical to use the claimed 4x4 
architecture. Motivation and combination are taken from the parent claim and herein 
incorporated by reference, with additional motivation as set forth in the immediately 
preceding paragraph. Motivation and combination for claim 10 takes the same form, 
given that the combination is based on the same logic without the limitations of claim 3; 
however, the same logic (about the screen being divided into various sub-spaces) still 
applies, given that the screen is explicitly divided as set forth in claim 10 anyway into 
various arrays of interleaved sample bins across the screen. 
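
For illustration only, one way a 4x4 array of sample bins repeated across screen space could be assigned to 16 sample managers is sketched below. This is a minimal sketch under stated assumptions and is not taken from Cloutier, Inada, or the application: the row-major numbering, the constant `BIN_ARRAY_SIZE`, and the function names `manager_for_bin` and `bins_for_manager` are hypothetical.

```python
from typing import List, Tuple

BIN_ARRAY_SIZE = 4  # assumed 4x4 array of bins, i.e. 16 sample managers


def manager_for_bin(bin_x: int, bin_y: int, size: int = BIN_ARRAY_SIZE) -> int:
    """Map a bin coordinate to a manager index; the pattern repeats every `size` bins."""
    return (bin_y % size) * size + (bin_x % size)


def bins_for_manager(manager: int, width_bins: int, height_bins: int,
                     size: int = BIN_ARRAY_SIZE) -> List[Tuple[int, int]]:
    """List every bin on a width_bins x height_bins screen owned by one manager."""
    return [(x, y)
            for y in range(height_bins)
            for x in range(width_bins)
            if manager_for_bin(x, y, size) == manager]


if __name__ == "__main__":
    # On an 8x8-bin screen, manager 0 owns one bin in each repeated 4x4 tile.
    print(bins_for_manager(0, 8, 8))
```

Under this assumed numbering, neighbouring bins always belong to different managers, which is the sense in which the sub-sets are finely interleaved.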

52. Claim 27 is rejected under 35 U.S.C. 103(a) as being unpatentable over Cloutier 
in view of McCanny further in view of Inada as applied to claim 25 above, and further in 
view of Choi. 

53. As to claim 27, Choi teaches weighted partial sums in 5:62-6:25, 
where the weights are based on position, e.g. the pixel location per line, which clearly 
comprises a sum of weights determined for each position as explicitly set forth in 
6:2-6:20, and the result of the weighted values are summed and put back into memory as 
set forth there. Motivation and combination are taken from claims 1 and 21 and herein 
incorporated by reference in their entirety. The motivations for combining Choi and 
Inada with Cloutier and McCanny are set forth in their respective segments and are 
mutually complementary and are not contradictory, and as such they together form a 
cohesive rationale for the combination of the four references. 

Allowable Subject Matter 

54. Claims 12, 16, 19-20, 28-29, and 31 would be allowable if rewritten to overcome 
the rejection(s) under 35 U.S.C. 112, 2nd paragraph, set forth in this Office action and 
to include all of the limitations of the base claim and any intervening claims. 

55. Claim 12 would be allowable because the prior art of record does not teach 
multiple means for rendering samples in combination with the other characteristics. 
Claim 16 would be allowable because none of the references teach more than one memory 
attached to each rendering element, where the term "memory units" is taken by 
definition to require multiple physical rather than logical memories. Claims 19-20 would 
be allowable because the prior art of record does not teach computing partial sums for 
each of the sample parameter values. Claims 28-29 would be allowable over the prior art 
because sample values of transparency and color values are not taught by prior art 
references, nor is the enumerated listing of weight functions in claim 28. Claim 31 
would be allowable because the prior art does not teach a reciprocal of the sum of the 
weights for calculating such values. 
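
For illustration only, normalizing a weighted sum by the reciprocal of the sum of the weights can be sketched as below. This is a generic sketch and does not reproduce the language of claim 31 or any cited reference; the function name `normalized_value` and the (weight, value) pair representation are assumptions made here.

```python
from typing import Sequence, Tuple


def normalized_value(samples: Sequence[Tuple[float, float]]) -> float:
    """Scale the weighted sum of sample values by the reciprocal of the sum of weights."""
    weighted_sum = sum(weight * value for weight, value in samples)
    weight_sum = sum(weight for weight, _ in samples)
    if weight_sum == 0:
        raise ValueError("sum of weights is zero; reciprocal is undefined")
    reciprocal = 1.0 / weight_sum  # computed once, then applied by multiplication
    return weighted_sum * reciprocal


if __name__ == "__main__":
    print(normalized_value([(1.0, 10.0), (3.0, 20.0)]))
```

Multiplying by a precomputed reciprocal, rather than dividing each result, lets the same factor be reused for every parameter that shares the same set of weights.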

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Eric VWoods whose telephone number is 703-305-0263. 
The examiner can normally be reached on M-F 7:30-5:00 alternate Fridays off. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Michael Razavi, can be reached on 703-305-4713. The fax phone number 
for the organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 